Search Results: "jack"

10 June 2022

Sam Hartman: Flailing to Replace Jack with Pipewire for DJ Audio

I could definitely use some suggestions here, both in terms of things to try or effective places to ask questions about Pipewire audio. The docs are improving, but are still in early stages. Pipewire promises to combine the functionality of PulseAudio and Jack. That would be great for me. I use Jack for my DJ work, and it s somewhat complicated and fragile. However, so far my attempts to replace Jack have been unsuccessful, and I might need to even use PulseAudio instead of Pipewire to get the DJ stuff working correctly. The Setup In the simplest setup I have a DJ controller. It s both a MIDI device and a sound card. It has 4 channel audio, but it s not typical surround sound. Two channels are the main speakers, and two channels are the headphones. Conceptually it might be better to model the controller as two sinks: one for the speakers and one for the headphones. At a hardware level they need to be one device for several reasons, especially including using a common clock. It s really important than only the main mix go out channel 1-2 (the speakers). Random beeps or sound from other applications going out the main speakers is disruptive and unprofessional. However, because I m blind, I need that sound. I especially need the output of Orca (my screen reader) and Emacspeak (another screen reader). So I need that output to go to the headphones. Under Pulse/Jack The DJ card is the Jack primary sound device (system:playback_1 through system:playback_4). I then use themodule-jack-sink Pulse module to connect Pulse to Jack. That becomes the default sink for Pulse, and I link front-left from that sink to system:playback_3. So, I get the system sounds and screen reader mixed into the left channel of my headphones and nowhere else. Enter Pipewire Initially Pipewire sees the DJ card as a 4-channel sound card and assumes it s surround4.0 (so front and rear left and right). It helpfully expands my stereo signal so that everything goes to the front and rear. So, exactly what I don t want to have happen happens: all my system sounds go out the main speakers (channel 1-2). It was easy to override Wireplumber s ALSA configuration and assign different channel positions. I tried assigning something like a1,a2,fl,fr hoping that Pipewire wouldn t mix things into aux channels that weren t part of the typical surround set. No luck. It did correctly reflect the channels in things like pacmd list sinks so my Pipewire config was being applied. But the sound was still wrong. * I tried turning off channelmix.upmix. That didn t help; that appears to be more about mixing stereo into center, rear and LFE. The basic approach of getting a stream to conform to the output node s channels appears to be hurting me here.

Turning off stream.dont-remix actually got stereo sound to do exactly what I wanted. If I use sox to play a stereo MP3 for example, it comes out the headphones and not my speakers. Unfortunately, that didn t help with the accessibility sounds at all. Those are mono in pulse land, and apparently mono is always expanded to all channels.
I didn t try turning off channelmix entirely. I m reasonably sure that would break mono sound entirely, so I d get no accessibility output which would make my computer entirely unusable.
I tried using jack_disconnect to disconnect the accessibility ports from all but the headphones. The accessibility applications aren t actually using Jack, but one of the cool things about Pipewire is that you can use Jack interfaces to manipulate non-Jack applications. Unfortunately, at least the Emacspeak espeak speech server regularly shuts down and restarts its sound connection. So, I get speech through the headphones for a phrase or two, and then it reverts to the default config.

I d love any ideas about how I can get this to work. I m sure it s simple I m just missing the right mental model or knowledge of how to configure things. Pipewire Not Talking to Jack I thought I could at least use Pipewire the same way I use Pulse. Namely, I can run a real jackd and connect up Pipewire to that server. According to the wiki, Pipewire can be a Jack client. It s disabled by default, because you need to make sure that Wireplumber is using the real Jack libraries rather than the Pipewire replacements. That s the case on Debian, so I enabled the feature. A Jack device appeared in wpctl status as did a Jack sink. Using jack_lsp on that device showed it was talking to the Jack server and connected to system:playback_*. Unfortunately, it doesn t work. The sink does not show up in pacmd list sinks, and pipewire-pulse gives an error about it not being ready. If I select it as the default sink in wpctl set-default I get no sound at all, at least from Pulse applications. Versions of things This is all on debian, approximately testing/bookworm or newer for the relevant libraries.

Pipewire 0.3.51-1
Wireplumber 0.4.10-2
pipewire-pulse and libspa0.2-jack are also 0.3.51-1 as you d expect
Jackd2 1.9.17~dfsg-1

comments

18 May 2022

Louis-Philippe V ronneau: Clojure Team 2022 Sprint Report

This is the report for the Debian Clojure Team remote sprint that took place on May 13-14th. Looking at my previous blog entries, this was my first Debian sprint since July 2020! Crazy how fast time flies... Many thanks to those who participated, namely:

Rob Browning (rlb)
Elana Hashman (ehashman)
J r me Charaoui (lavamind)
Leandro Doctors (allentiak)
Louis-Philippe V ronneau (pollo)

Sadly, Utkarsh Gupta although having planned on participating ended up not being able to and worked on DebConf Bursary paperwork instead. rlb Rob mostly worked on creating a dh-clojure tool to help make packaging Clojure libraries easier. At the moment, most of the packaging is done manually, by invoking build tools by hand. Having a tool to automate many of the steps required to build Clojure packages would go a long way in making them more uniform. His work (although still very much a WIP) can be found here: https://salsa.debian.org/rlb/dh-clojure/ ehashman Elana:

Finished the Java Team VCS migration to the Clojure Team namespace.
Worked on updating Leiningen to 2.9.8.
Proposed an upstream dependency update in Leiningen to match Debian's most recent version.
Gave pollo Owner access on the Clojure Team namespace and added lavamind as a Developer.
Uploaded Clojure 1.10.3-1.
Updated sjacket-clojure to version 0.1.1.1 and uploaded it to experimental.
Added build tests to spec-alpha-clojure.
Filed bug #1010995 for missing test dependency for Clojure.
Closed bugs #976151, #992735 and #992736.

lavamind It was J r me's first time working on Clojure packages, and things went great! During the sprint, he:

Joined the Clojure Team on salsa.
Identified missing dependencies to update puppetdb to the 7.x release.
Learned how to package Clojure libraries in Debian.
Packaged murphy-clojure, truss-clojure and encore-clojure and uploaded them to NEW.
Began to package nippy-clojure.

allentiak Leandro joined us on Saturday, since he couldn't get off work on Friday. He mostly continued working on replacing our in-house scripts for /usr/bin/clojure by upstream's, a task he had already started during GSoC 2021. Sadly, none of us were familiar with Debian's mechanism for alternatives. If you (yes you, dear reader) are familiar with it, I'm sure he would warmly welcome feedback on his development branch. pollo As for me, I:

Fixed a classpath bug in core-async-clojure that was breaking other libraries.
Added meaningful autopkgtests to core-async-clojure.
Uploaded new versions of tools-analyzer-clojure and trapperkeeper-clojure with autopkgtests.
Updated pomegranate-clojure and nrepl-clojure to the latest upstream version and revamped the way they were packaged.
Assisted lavamind with Clojure packaging.

Overall, it was quite a productive sprint! Thanks to Debian for sponsoring our food during the sprint. It was nice to be able to concentrate on fixing things instead of making food :) Here's a bonus picture of the nice sushi platter I ended up getting for dinner on Saturday night: Picture of a sushi platter

2 April 2022

Ian Jackson: Otter (game server) 1.0.0 released

I have just released Otter 1.0.0. Recap: what is Otter Otter is my game server for arbitrary board games. Unlike most online game systems. It does not know (nor does it need to know) the rules of the game you are playing. Instead, it lets you and your friends play with common tabletop/boardgame elements such as hands of cards, boards, and so on. So it s something like a tabletop simulator (but it does not have any 3D, or a physics engine, or anything like that). There are provided game materials and templates for Penultima, Mao, and card games in general. Otter also supports uploadable game bundles, which allows users to add support for additional games - and this can be done without programming. For more information, see the online documentation. There are a longer intro and some screenshots in my 2021 introductory blog post about Otter Releasing 1.0.0 I m calling this release 1.0.0 because I think I can now say that its quality, reliability and stability is suitable for general use. In particular, Otter now builds on Stable Rust, which makes it a lot easier to install and maintain. Switching web framework, and async Rust I switched Otter from the Rocket web framework to Actix. There are things to prefer about both systems, and I still have a soft spot for Rocket. But ultimately I needed a framework which was fully released and supported for use with Stable Rust. There are few if any Rust web frameworks that are not async. This is rather a shame. Async Rust is a considerably more awkward programming environment than ordinary non-async Rust. I don t want to digress into a litany of complaints, but suffice it to say that while I really love Rust, my views on async Rust are considerably more mixed. Future plans In the near future I plan to add a couple of features to better support some particular games: currency-like resources, and a better UI for dice-like randomness. In the longer term, Otter s, installation and account management arrangements are rather unsophisticated and un-webby. There is not currently any publicly available instance for you to try it out without installing it on a machine of your own. There s not even any provided binaries: you must built Otter yourself. I hope to be able to improve this situation but it involves dealing with cloud CI and containers and so-on, which can all be rather unpleasant. Users on chiark will find an instance of Otter there.

comments

3 March 2022

Ian Jackson: 3D printed hard case for Fairphone 4

About 4 years ago, I posted about making a 3D printed case for my then-new phone. The FP2 was already a few years old when I got one and by now, some spares are unavailable - which is a problem, because I'm terribly hard on hardware. Indeed, that's why I need a very sturdy case for my phone - a case which can be ablative when necessary. With the arrival of my new Fairphone 4, I've updated my case design. Sadly the FP4 doesn't have a notification LED - I guess we're supposed to be glued to the screen and leaving the phone ignored in a corner unless it lights up is forbidden. But that does at least make the printing simpler, as there's no need for a window for the LED. Source code: https://www.chiark.greenend.org.uk/ucgi/~ianmdlvl/git?p=reprap-play.git;a=blob;f=fairphone4-case.scad;h=1738612c2aafcd4ee4ea6b8d1d14feffeba3b392;hb=629359238b2938366dc6e526d30a2a7ddec5a1b0 And the diagrams (which are part of the source, although I didn't update them for the FP4 changes: https://www.chiark.greenend.org.uk/ucgi/~ianmdlvl/git?p=reprap-diagrams.git;a=tree;f=fairphone-case;h=65f423399cbcfd3cf24265ed3216e6b4c0b26c20;hb=07e1723c88a294d68637bb2ca3eac388d2a0b5d4 ( big pictures )

comments

23 February 2022

Ian Jackson: Rooting an Eos Fairphone 4

Last week I received (finally) my Fairphone 4, supplied with a de-googled operating system, which I had ordered from the E Foundation s shop in December. (I m am very hard on hardware and my venerable Fairphone 2 is really on its last legs.) I expect to have full control over the software on any computing device I own which is as complicated, capable, and therefore, hazardous, as a mobile phone. Unfortunately the Eos image (they prefer to spell it /e/ os , srsly!) doesn t come with a way to get root without taking fairly serious measures including unlocking the bootloader. Unlocking the bootloader wouldn t be desirable for me but I can t live without root. So. I started with these helpful instructions: https://forum.xda-developers.com/t/fairphone-4-root.4376421/ I found the whole process a bit of a trial, and I thought I would write down what I did. But, it s not straightforward, at least for someone like me who only has a dim understanding of all this Android stuff. Unfortunately, due to the number of missteps and restarts, what I actually did is not really a sensible procedure. So here is a retcon of a process I think will work: Unlock the bootloader The E Foundation provide instructions for unlocking the bootloader on a stock FP4, here https://doc.e.foundation/devices/FP4/install and they seem applicable to the Murena phone supplied with Eos pre-installed, too. NB tht unlocking the bootloader wipes the phone. So we do it first. So:

Power on the phone, with no SIM installed
You get a welcome screen.
Skip all things on startup including wifi
Go to the very end of the settings, tap a gazillion times on the phone s version until you re a developer
In the developer settings, allow usb debugging
In the developer settings, allow oem bootloader unlocking
Connect a computer via a USB cable, say yes on phone to USB debugging
adb reboot bootloader
The phone will reboot into a texty kind of screen, the bootloader
fastboot flashing unlock
The phone will reboot, back to the welcome screen
Repeat steps 3-9 (maybe not all are necessary)
fastboot flashing unlock_critical
The phone will reboot, back to the welcome screen

Note that although you are running fastboot, you must run this command with the phone in bootloader mode, not fastboot (aka fastbootd ) mode. If you run fastboot flashing unlcok from fastboot you just get a don t know what you re talking about . I found conflicting instructions on what kind of Vulcan nerve pinches could be used to get into which boot modes, and had poor experiences with those. adb reboot bootloader always worked reliably for me. Some docs say to run fastboot oem unlock; I used flashing. Maybe this depends on the Android tools version. Initial privacy prep and OTA update We want to update the supplied phone OS. The build mine shipped with is too buggy to run Magisk, the application we are going to use to root the phone. (With the pre-installed phone OS, Magisk crashes at the patch boot image step.) But I didn t want to let the phone talk to Google, even for the push notifications registration.

From the welcome screen, skip all things except location, date, time. Notably, do not set up wifi
In settings, microg section
1. turn off cloud messaging
2. turn off google safetynet
3. turn off google registration (NB you must do this after the other two, because their sliders become dysfunctional after you turn google registration off)
4. turn off both location modules
In settings, location section, turn off allowed location for browser and magic earth
Now go into settings and enable wifi, giving it your wifi details
Tell the phone to update its operating system. This is a big download.

Install Magisk, the root manager (As a starting point I used these instructions https://www.xda-developers.com/how-to-install-magisk/ and a lot of random forum posts.) You will need the official boot.img. Bizarrely there doesn t seem to be a way to obtain this from the phone. Instead, you must download it. You can find it by starting at https://doc.e.foundation/devices/FP4/install which links to https://images.ecloud.global/stable/FP4/. At the time of writing, the most recent version, whose version number seemed to correspond to the OS update I installed above, was IMG-e-0.21-r-20220123158735-stable-FP4.zip.

Download the giant zipfile to your computer
Unzip it to extract boot.img
Copy the file to your phone s storage . Eg, via adb: with the phone booted into the main operating system, using USB debugging, adb push boot.img /storage/self/primary/Download.
On the phone, open the browser, and enter https://f-droid.org. Click on the link to install f-droid. You will need to enable installing apps from the browser (follow the provided flow to the settings, change the setting, and then use Back, and you can do the install). If you wish, you can download the f-droid apk separately on a computer, and verify it with pgp.
Using f-droid, install Magisk. You will need to enable installing apps from f-droid. (I installed Magisk from f-droid because 1. I was going to trust f-droid anyway 2. it has a shorter URL than Magisk s.)
Open the Magisk app. Tell Magisk to install (Magisk, not the app). There will be only one option: patch boot file. Tell it to patch the boot.img file from before.
Transfer the magisk_patched-THING.img back to your computer (eg via adb pull).
adb reboot bootloader
fastboot boot magisk_patched-THING.img (again, NB, from bootloader mode, not from fastboot mode)
In Magisk you ll see it shows as installed. But it s not really; you ve just booted from an image with it. Ask to install Magisk with Direct install .

After you have done all this, I believe that each time you do an over-the-air OS update, you must, between installing the update and rebooting the phone, ask Magisk to Install to inactive slot (after OTA) . Presumably if you forget you must do the fastboot boot dance again. After all this, I was able to use tsu in Termux. There s a strange behaviour with the root prompt you get apropos Termux s request for root; I found that it definitely worked if Termux wasn t the foreground app You have to leave the bootloader unlocked. Howwever, as I understand it, the phone s encryption will still prevent an attacker from hoovering the data out of your phone. The bootloader lock is to prevent someone tricking you into entering the decryption passkey into a trojaned device. Other things to change

Probably, after you re done with this, disable installing apps from the Browser. I will install Signal before doing that, since that s not in f-droid because of mutual distrust between the f-droid and Signal folks. The permission is called Install unknown apps .
Turn off instant apps aka open links in apps even if the app is not installed . OMG WTF BBQ.
Turn off wifi scanning even if wifi off . WTF.
I turned off storage manager auto delete, on the grounds that I didn t know what the phone might think of as having been backed up . I can manage my own space use, thanks very much.

There are probably other things to change. I have not yet transferred my Signal account from my old phone. It is possible that Signal will require me to re-enable the google push notifications, but I hope that having disabled them in microg it will be happy to use its own system, as it does on my old phone.

comments

4 February 2022

Ian Jackson: EUDCC QR codes vs NHS Travel barcodes vs TAC Verify

The EU Digital Covid Certificate scheme is a format for (digitally signed) vaccination status certificates. Not only EU countries participate - the UK is now a participant in this scheme. I am currently on my way to go skiing in the French Alps. So I needed a certificate that would be accepted in France. AFAICT the official way to do this is to get the international certificate from the NHS, and take it to a French pharmacy who will convert it into something suitably French. (AIUI the NHS international barcode is the same regardless of whether you get it via the NHS website, the NHS app, or a paper letter. NB that there is one barcode per vaccine dose so you have to get the right one - probably that means your booster since there s a 9 month rule!) I read on an forum somewhere that you could use the French TousAntiCovid app to convert the barcode. So I thought I would try that. The TousAntiCovid is Free Softare and on F-Droid, so I was happy to install and use it for this. I also used the French TAC Verify app to check to see what barcodes were accepted. (I found an official document addressed to French professionals recommending this as an option for verifying the status of visitors to one s establishment.) Unfortunately this involves a googlified phone, but one could use a burner phone or ask a friend who s bitten that bullet already. I discovered that, indeed:

My own NHS QR code is not accepted by TAC Verify
My own NHS QR code can be loaded into TousAntiCovid, and added to my wallet in the app
If I get TousAntiCovid to display that certificate, it shows a visually different QR code which TAC Verify accepts

This made me curious. I used a QR code reader to decode both barcodes. The decodings were identical! A long string of guff starting HC1:. AIUI it is an encoded JWT. But there was a difference in the framing: Binary Eye reported that the NHS barcode used error correction level M (medium, aka 15%). The TousAntiCovid barcode used level L (low, 7%). I had my QR code software regenerate a QR code at level M for the data from the TousAntiCovid code. The result was a QR code which is identical (pixel-wise) to the one from the NHS. So the only difference is the error correction level. Curiously, both L (low, generated by TousAntiCovid, accepted by TAC Verify) and M (medium, generated by NHS, rejected by TAC Verify) are lower than the Q (25") recommended by what I think is the specification. This is all very odd. But the upshot is that I think you can convert the NHS international barcode into something that should work in France simply by passing it through any QR code software to re-encode it at error correction level L (7%). But if you re happy to use the TousAntiCovid app it s probably a good way to store them. I guess I ll find out when I get to France if the converted NHS barcodes work in real establishments. Thanks to the folks behind sanipasse.fr for publishing some helpful backround info and operating a Free Software backed public verification service. Footnote To compare the QR codes pixelwise, I roughly cropped the NHS PDF image using a GUI tool, and then on each of the two images used pnmcrop (to trim the border), pnmscale (to rescale the one-pixel-per-pixel output from Binary Eye) and pnmarith -difference to compare them (producing a pretty squirgly image showing just the pixel edges due to antialiasing).

comments

23 January 2022

Louis-Philippe V ronneau: Goodbye Nexus 5

I've blogged a few times already about my Nexus 5, the Android device I have/had been using for 8 years. Sadly, it died a few weeks ago, when the WiFi chip stopped working. I could probably have attempted a mainboard swap, but at this point, getting a new device seemed like the best choice. In a world where most Android devices are EOL after less than 3 years, it is amazing I was able to keep this device for so long, always running the latest Android version with the latest security patch. The Nexus 5 originally shipped with Android 4.4 and when it broke, I was running Android 11, with the November security patch! I'm very grateful to the FOSS Android community that made this possible, especially the LineageOS community. I've replaced my Nexus 5 by a used Pixel 3a, mostly because of the similar form factor, relatively affordable price and the presence of a headphone jack. Google also makes flashing a custom ROM easy, although I had more trouble with this than I first expected. The first Pixel 3a I bought on eBay was a scam: I ordered an "Open Box" phone and it arrived all scratched¹ and with a broken rear camera. The second one I got (from the Amazon Renewed program) arrived in perfect condition, but happened to be a Verizon model. As I found out, Verizon locks the bootloader on their phones, making it impossible to install LineageOS². The vendor was kind enough to let me return it. As they say, third time's the charm. This time around, I explicitly bought a phone on eBay listed with a unlocked bootloader. I'm very satisfied with my purchase, but all in all, dealing with all the returns and the shipping was exhausting. Hopefully this phone will last as long as my Nexus 5!

There was literally a whole layer missing at the back, as if someone had sanded the phone...
Apparently, and "Unlocked phone" means it is "SIM unlocked", i.e. you can use it with any carrier. What I should have been looking for is a "Factory Unlocked phone", one where the bootloader isn't locked :L

4 January 2022

Russell Coker: Terrorists Inspired by Fiction

The Tom Clancy book Debt of Honor published in August 1994 first introduced the concept of a heavy passenger aircraft being used as a weapon by terrorists against a well defended building. In April 1994 there was an attempt to hijack and deliberately crash FedEx flight 705. It s possible for a book to be changed 4 months before publication, but it seems unlikely that a significant plot point in a series of books was changed in such a small amount of time so it s likely that Tom Clancy got the idea first. There have been other variations on that theme, such as the Yokosuka_MXY-7 Kamakazi flying bomb (known by the Allies as Baka which is Japanese for idiot). But Tom Clancy seemed to pioneer the idea of a commercial passenger jet being subverted for the purpose of ground attack. 7 years after Tom Clancy s book was published the 911 hijackings happened. The TV series Black Mirror first aired in 2011, and the first episode was about terrorists kidnapping a princess and demanding that the UK PM perform an indecent act with a pig for her release. While the plot was a little extreme (the entire series is extreme) the basic concept of sexual extortion based on terrorist acts is something that could be done in real life, and if terrorists were inspired by this they are taking longer than expected to do it. Most democracies seem to end up with two major parties that are closely matched. Even if a government was strict about not negotiating with terrorists it seems likely that terrorists demanding that a politician perform an unusual sex act on TV would change things, supporters would be divided into groups that support and oppose negotiating. Discussions wouldn t be as civil as when the negotiation involves money or freeing prisoners. If an election result was perceived to have been influenced by such terrorism then supporters of the side that lost would claim it to be unfair and reject the result. If the goal of terrorists was to cause chaos then that would be one way of achieving it, and they have had over 10 years to consider this possibility. Are we overdue for a terror attack inspired by Black Mirror?

3 January 2022

Ian Jackson: Debian s approach to Rust - Dependency handling

tl;dr: Faithfully following upstream semver, in Debian package dependencies, is a bad idea. Introduction I have been involved in Debian for a very long time. And I ve been working with Rust for a few years now. Late last year I had cause to try to work on Rust things within Debian. When I did, I found it very difficult. The Debian Rust Team were very helpful. However, the workflow and tooling require very large amounts of manual clerical work - work which it is almost impossible to do correctly since the information required does not exist. I had wanted to package a fairly straightforward program I had written in Rust, partly as a learning exercise. But, unfortunately, after I got stuck in, it looked to me like the effort would be wildly greater than I was prepared for, so I gave up. Since then I ve been thinking about what I learned about how Rust is packaged in Debian. I think I can see how to fix some of the problems. Although I don t want to go charging in and try to tell everyone how to do things, I felt I ought at least to write up my ideas. Hence this blog post, which may become the first of a series. This post is going to be about semver handling. I see problems with other aspects of dependency handling and source code management and traceability as well, and of course if my ideas find favour in principle, there are a lot of details that need to be worked out, including some kind of transition plan. How Debian packages Rust, and build vs runtime dependencies Today I will be discussing almost entirely build-dependencies; Rust doesn t (yet?) support dynamic linking, so built Rust binaries don t have Rusty dependencies. However, things are a bit confusing because even the Debian binary packages for Rust libraries contain pure source code. So for a Rust library package, building the Debian binary package from the Debian source package does not involve running the Rust compiler; it s just file-copying and format conversion. The library s Rust dependencies do not need to be installed on the build machine for this. So I m mostly going to be talking about Depends fields, which are Debian s way of talking about runtime dependencies, even though they are used only at build-time. The way this works is that some ultimate leaf package (which is supposed to produce actual executable code) Build-Depends on the libraries it needs, and those Depends on their under-libraries, so that everything needed is installed. What do dependencies mean and what are they for anyway? In systems where packages declare dependencies on other packages, it generally becomes necessary to support versioned dependencies. In all but the most simple systems, this involves an ordering (or similar) on version numbers and a way for a package A to specify that it depends on certain versions of B. Both Debian and Rust have this. Rust upstream crates have version numbers and can specify their dependencies according to semver. Debian s dependency system can represent that. So it was natural for the designers of the scheme for packaging Rust code in Debian to simply translate the Rust version dependencies to Debian ones. However, while the two dependency schemes seem equivalent in the abstract, their concrete real-world semantics are totally different. These different package management systems have different practices and different meanings for dependencies. (Interestingly, the Python world also has debates about the meaning and proper use of dependency versions.) The epistemological problem Consider some package A which is known to depend on B. In general, it is not trivial to know which versions of B will be satisfactory. I.e., whether a new B, with potentially-breaking changes, will actually break A. Sometimes tooling can be used which calculates this (eg, the Debian shlibdeps system for runtime dependencies) but this is unusual - especially for build-time dependencies. Which versions of B are OK can normally only be discovered by a human consideration of changelogs etc., or by having a computer try particular combinations. Few ecosystems with dependencies, in the Free Software community at least, make an attempt to precisely calculate the versions of B that are actually required to build some A. So it turns out that there are three cases for a particular combination of A and B: it is believed to work; it is known not to work; and: it is not known whether it will work. And, I am not aware of any dependency system that has an explicit machine-readable representation for the unknown state, so that they can say something like A is known to depend on B; versions of B before v1 are known to break; version v2 is known to work . (Sometimes statements like that can be found in human-readable docs.) That leaves two possibilities for the semantics of a dependency A depends B, version(s) V..W: Precise: A will definitely work if B matches V..W, and Optimistic: We have no reason to think B breaks with any of V..W. At first sight the latter does not seem useful, since how would the package manager find a working combination? Taking Debian as an example, which uses optimistic version dependencies, the answer is as follows: The primary information about what package versions to use is not only the dependencies, but mostly in which Debian release is being targeted. (Other systems using optimistic version dependencies could use the date of the build, i.e. use only packages that are current .)

	Precise	Optimistic
People involved in version management	Package developers, downstream developers/users.	Package developers, downstream developer/users, distribution QA and release managers.
Package developers declare versions V and dependency ranges V..W so that	It definitely works.	A wide range of B can satisfy the declared requirement.
The principal version data used by the package manager	Only dependency versions.	Contextual, eg, Releases - set(s) of packages available.
Version dependencies are for	Selecting working combinations (out of all that ever existed).	Sequencing (ordering) of updates; QA.
Expected use pattern by a downstream	Downstream can combine any declared-good combination.	Use a particular release of the whole system. Mixing-and-matching requires additional QA and remedial work.
Downstreams are protected from breakage by	Pessimistically updating versions and dependencies whenever anything might go wrong.	Whole-release QA.
A substantial deployment will typically contain	Multiple versions of many packages.	A single version of each package, except where there are actual incompatibilities which are too hard to fix.
Package updates are driven by	Top-down: Depending package updates the declared metadata.	Bottom-up: Depended-on package is updated in the repository for the work-in-progress release.

So, while Rust and Debian have systems that look superficially similar, they contain fundamentally different kinds of information. Simply representing the Rust versions directly into Debian doesn t work. What is currently done by the Debian Rust Team is to manually patch the dependency specifications, to relax them. This is very labour-intensive, and there is little automation supporting either decisionmaking or actually applying the resulting changes. What to do Desired end goal To update a Rust package in Debian, that many things depend on, one need simply update that package. Debian s sophisticated build and CI infrastructure will try building all the reverse-dependencies against the new version. Packages that actually fail against the new dependency are flagged as suffering from release-critical problems. Debian Rust developers then update those other packages too. If the problems turn out to be too difficult, it is possible to roll back. If a problem with a depending packages is not resolved in a timely fashion, priority is given to updating core packages, and the depending package falls by the wayside (since it is empirically unmaintainable, given available effort). There is no routine manual patching of dependency metadata (or of anything else). Radical proposal Debian should not precisely follow upstream Rust semver dependency information. Instead, Debian should optimistically try the combinations of packages that we want to have. The resulting breakages will be discovered by automated QA; they will have to be fixed by manual intervention of some kind, but usually, simply updating the depending package will be sufficient. This no longer ensures (unlike the upstream Rust scheme) that the result is expected to build and work if the dependencies are satisfied. But as discussed, we don t really need that property in Debian. More important is the new property we gain: that we are able to mix and match versions that we find work in practice, without a great deal of manual effort. Or to put it another way, in Debian we should do as a Rust upstream maintainer does when they do the regular update dependencies for new semvers task: we should update everything, see what breaks, and fix those. (In theory a Rust upstream package maintainer is supposed to do some additional checks or something. But the practices are not standardised and any checks one does almost never reveal anything untoward, so in practice I think many Rust upstreams just update and see what happens. The Rust upstream community has other mechanisms - often, reactive ones - to deal with any problems. Debian should subscribe to those same information sources, eg RustSec.) Nobbling cargo Somehow, when cargo is run to build Rust things against these Debian packages, cargo s dependency system will have to be overridden so that the version of the package that is actually selected by Debian s package manager is used by cargo without complaint. We probably don t want to change the Rust version numbers of Debian Rust library packages, so this should be done by either presenting cargo with an automatically-massaged Cargo.toml where the dependency version restrictions are relaxed, or by using a modified version of cargo which has special option(s) to relax certain dependencies. Handling breakage Rust packages in Debian should already be provided with autopkgtests so that ci.debian.net will detect build breakages. Build breakages will stop the updated dependency from migrating to the work-in-progress release, Debian testing. To resolve this, and allow forward progress, we will usually upload a new version of the dependency containing an appropriate Breaks, and either file an RC bug against the depending package, or update it. This can be done after the upload of the base package. Thus, resolution of breakage due to incompatibilities will be done collaboratively within the Debian archive, rather than ad-hoc locally. And it can be done without blocking. My proposal prioritises the ability to make progress in the core, over stability and in particular over retaining leaf packages. This is not Debian s usual approach but given the Rust ecosystem s practical attitudes to API design, versioning, etc., I think the instability will be manageable. In practice fixing leaf packages is not usually really that hard, but it s still work and the question is what happens if the work doesn t get done. After all we are always a shortage of effort - and we probably still will be, even if we get rid of the makework clerical work of patching dependency versions everywhere (so that usually no work is needed on depending packages). Exceptions to the one-version rule There will have to be some packages that we need to keep multiple versions of. We won t want to update every depending package manually when this happens. Instead, we ll probably want to set a version number split: rdepends which want version <X will get the old one. Details - a sketch I m going to sketch out some of the details of a scheme I think would work. But I haven t thought this through fully. This is still mostly at the handwaving stage. If my ideas find favour, we ll have to do some detailed review and consider a whole bunch of edge cases I m glossing over. The dependency specification consists of two halves: the depending .deb s Depends (or, for a leaf package, Build-Depends) and the base .deb Version and perhaps Breaks and Provides. Even though libraries vastly outnumber leaf packages, we still want to avoid updating leaf Debian source packages simply to bump dependencies. Dependency encoding proposal Compared to the existing scheme, I suggest we implement the dependency relaxation by changing the depended-on package, rather than the depending one. So we retain roughly the existing semver translation for Depends fields. But we drop all local patching of dependency versions. Into every library source package we insert a new Debian-specific metadata file declaring the earliest version that we uploaded. When we translate a library source package to a .deb, the binary package build adds Provides for every previous version. The effect is that when one updates a base package, the usual behaviour is to simply try to use it to satisfy everything that depends on that base package. The Debian CI will report the build or test failures of all the depending packages which the API changes broke. We will have a choice, then: Breakage handling - update broken depending packages individually If there are only a few packages that are broken, for each broken dependency, we add an appropriate Breaks to the base binary package. (The version field in the Breaks should be chosen narrowly, so that it is possible to resolve it without changing the major version of the dependency, eg by making a minor source change.) When can then do one of the following:

Update the dependency from upstream, to a version which works with the new base. (Assuming there is one.) This should be the usual response.
Fix the dependency source code so that builds and works with the new base package. If this wasn t just a backport of an upstream change, we should send our fix upstream. (We should prefer to update the whole package, than to backport an API adjustment.)
File an RC bug against the dependency (which will eventually trigger autoremoval), or preemptively ask for the Debian release managers to remove the dependency from the work-in-progress release.

Breakage handling - declare new incompatible API in Debian If the API changes are widespread and many dependencies are affected, we should represent this by changing the in-Debian-source-package metadata to arrange for fewer Provides lines to be generated - withdrawing the Provides lines for earlier APIs. Hopefully examination of the upstream changelog will show what the main compat break is, and therefore tell us which Provides we still want to retain. This is like declaring Breaks for all the rdepends. We should do it if many rdepends are affected. Then, for each rdependency, we must choose one of the responses in the bullet points above. In practice this will often be a mass bug filing campaign, or large update campaign. Breakage handling - multiple versions Sometimes there will be a big API rewrite in some package, and we can t easily update all of the rdependencies because the upstream ecosystem is fragmented and the work involved in reconciling it all is too substantial. When this happens we will bite the bullet and include multiple versions of the base package in Debian. The old version will become a new source package with a version number in its name. This is analogous to how key C/C++ libraries are handled. Downsides of this scheme The first obvious downside is that assembling some arbitrary set of Debian Rust library packages, that satisfy the dependencies declared by Debian, is no longer necessarily going to work. The combinations that Debian has tested - Debian releases - will work, though. And at least, any breakage will affect only people building Rust code using Debian-supplied libraries. Another less obvious problem is that because there is no such thing as Build-Breaks (in a Debian binary package), the per-package update scheme may result in no way to declare that a particular library update breaks the build of a particular leaf package. In other words, old source packages might no longer build when exposed to newer versions of their build-dependencies, taken from a newer Debian release. This is a thing that already happens in Debian, with source packages in other languages, though. Semver violation I am proposing that Debian should routinely compile Rust packages against dependencies in violation of the declared semver, and ship the results to Debian s millions of users. This sounds quite alarming! But I think it will not in fact lead to shipping bad binaries, for the following reasons: The Rust community strongly values safety (in a broad sense) in its APIs. An API which is merely capable of insecure (or other seriously bad) use is generally considered to be wrong. For example, such situations are regarded as vulnerabilities by the RustSec project, even if there is no suggestion that any actually-broken caller source code exists, let alone that actually-broken compiled code is likely. The Rust community also values alerting programmers to problems. Nontrivial semantic changes to APIs are typically accompanied not merely by a semver bump, but also by changes to names or types, precisely to ensure that broken combinations of code do not compile. Or to look at it another way, in Debian we would simply be doing what many Rust upstream developers routinely do: bump the versions of their dependencies, and throw it at the wall and hope it sticks. We can mitigate the risks the same way a Rust upstream maintainer would: when updating a package we should of course review the upstream changelog for any gotchas. We should look at RustSec and other upstream ecosystem tracking and authorship information. Difficulties for another day As I said, I see some other issues with Rust in Debian.

I think the library feature flag encoding scheme is unnecessary. I hope to explain this in a future essay.
I found Debian s approach to handling the source code for its Rust packages quite awkward; and, it has some troubling properties. Again, I hope to write about this later.
I get the impression that updating rustc in Debian is a very difficult process. I haven t worked on this myself and I don t feel qualified to have opinions about it. I hope others are thinking about how to make things easier.

Thanks all for your attention!

comments

7 December 2021

Evgeni Golov: The Mocking will continue, until CI improves

One might think, this blog is exclusively about weird language behavior and yelling at computers Well, welcome to another episode of Jackass! Today's opponent is Ruby, or maybe minitest , or maybe Mocha. I'm not exactly sure, but it was a rather amusing exercise and I like to share my nightmares ;) It all started with the classical "you're using old and unmaintained software, please switch to something new". The first attempt was to switch from the ci_reporter_minitest plugin to the minitest-ci plugin. While the change worked great for Foreman itself, it broke the reporting in Katello - the tests would run but no junit.xml was generated and Jenkins rightfully complained that it got no test results. While investigating what the hell was wrong, we realized that Katello was already using a minitest reporting plugin: minitest-reporters. Loading two different reporting plugins seemed like a good source for problems, so I tried using the same plugin for Foreman too. Guess what? After a bit of massaging (mostly to disable the second minitest-reporters initialization in Katello) reporting of test results from Katello started to work like a charm. But now the Foreman tests started to fail. Not fail to report, fail to actually run. WTH The failure was quite interesting too:

test/unit/parameter_filter_test.rb:5:in  block in <class:ParameterFilterTest>':
  Mocha methods cannot be used outside the context of a test (Mocha::NotInitializedError)

Yes, this is a single test file failing, all others were fine. The failing code doesn't look problematic on first glance:

require 'test_helper'
class ParameterFilterTest < ActiveSupport::TestCase
  let(:klass) do
    mock('Example').tap do  k 
      k.stubs(:name).returns('Example')
    end
  end
  test 'something' do
    something
  end
end

The failing line (5) is mock('Example').tap and for some reason Mocha thinks it's not initialized here. This certainly has something to do with how the various reporting plugins inject themselves, but I really didn't want to debug how to run two reporting plugins in parallel (which, as you remember, didn't expose this behavior). So the only real path forward was to debug what's happening here. Calling the test on its own, with one of the working reporter was the first step:

$ bundle exec rake test TEST=test/unit/parameter_filter_test.rb TESTOPTS=-v
 
#<Mocha::Mock:0x0000557bf1f22e30>#test_0001_permits plugin-added attribute = 0.04 s = .
#<Mocha::Mock:0x0000557bf12cf750>#test_0002_permits plugin-added attributes from blocks = 0.49 s = .

Wait, what? #<Mocha::Mock: >? Shouldn't this read more like ParameterFilterTest:: as it happens for every single other test in our test suite? It definitely should! That's actually great, as it tells us that there is really something wrong with the test and the change of the reporting plugin just makes it worse. What comes next is sheer luck. Well, that, and years of experience in yelling at computers. We use let(:klass) to define an object called klass and this object is a Mocha::Mock that we'll use in our tests later. Now klass is a very common term in Ruby when talking about classes and needing to store them mostly because one can't use class which is a keyword. Is something else in the stack using klass and our let is overriding that, making this whole thing explode? It was! The moment we replaced klass with klass1 (silly, I know, but there also was a klass2 in that code, so it did fit), things started to work nicely. I really liked Tomer's comment in the PR: "no idea why, but I am not going to dig into mocha to figure that out." Turns out, I couldn't let (HAH!) the code rest and really wanted to understand what happened there. What I didn't want to do is to debug the whole Foreman test stack, because it is massive. So I started to write a minimal reproducer for the issue. All starts with a Gemfile, as we need a few dependencies:

gem 'rake'
gem 'mocha'
gem 'minitest', '~> 5.1', '< 5.11'

Then a Rakefile:

require 'rake/testtask'
Rake::TestTask.new(:test) do  t 
  t.libs << 'test'
  t.test_files = FileList["test/**/*_test.rb"]
end
task :default => :test

And a test! I took the liberty to replace ActiveSupport::TestCase with Minitest::Test, as the test won't be using any Rails features and I wanted to keep my environment minimal.

require 'minitest/autorun'
require 'minitest/spec'
require 'mocha/minitest'
class ParameterFilterTest < Minitest::Test
  extend Minitest::Spec::DSL
  let(:klass) do
    mock('Example').tap do  k 
      k.stubs(:name).returns('Example')
    end
  end
  def test_lol
    assert klass
  end
end

Well, damn, this passed! Is it Rails after all that breaks stuff? Let's add it to the Gemfile!

$ vim Gemfile
$ bundle install
$ bundle exec rake test TESTOPTS=-v
 
#<Mocha::Mock:0x0000564bbfe17e98>#test_lol = 0.00 s = .

Wait, I didn't change anything and it's already failing?! Fuck! I mean, cool! But the test isn't minimal yet. What can we reduce? let is just a fancy, lazy def, right? So instead of let(:klass) we should be able to write def class and achieve a similar outcome and drop that Minitest::Spec.

require 'minitest/autorun'
require 'mocha/minitest'
class ParameterFilterTest < Minitest::Test
  def klass
    mock
  end
  def test_lol
    assert klass
  end
end

$ bundle exec rake test TESTOPTS=-v
 
/home/evgeni/Devel/minitest-wtf/test/parameter_filter_test.rb:5:in  klass': Mocha methods cannot be used outside the context of a test (Mocha::NotInitializedError)
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/railties-6.1.4.1/lib/rails/test_unit/reporter.rb:68:in  format_line'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/railties-6.1.4.1/lib/rails/test_unit/reporter.rb:15:in  record'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:682:in  block in record'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:681:in  each'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:681:in  record'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:324:in  run_one_method'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:311:in  block (2 levels) in run'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:310:in  each'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:310:in  block in run'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:350:in  on_signal'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:337:in  with_info_handler'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:309:in  run'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:159:in  block in __run'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:159:in  map'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:159:in  __run'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:136:in  run'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:63:in  block in autorun'
rake aborted!

Oh nice, this is even better! Instead of the mangled class name, we now get the very same error the Foreman tests aborted with, plus a nice stack trace! But wait, why is it pointing at railties? We're not loading that! Anyways, lets look at railties-6.1.4.1/lib/rails/test_unit/reporter.rb, line 68

def format_line(result)
  klass = result.respond_to?(:klass) ? result.klass : result.class
  "%s#%s = %.2f s = %s" % [klass, result.name, result.time, result.result_code]
end

Heh, this is touching result.klass, which we just messed up. Nice! But quickly back to railties What if we only add that to the Gemfile, not full blown Rails?

gem 'railties'
gem 'rake'
gem 'mocha'
gem 'minitest', '~> 5.1', '< 5.11'

Yepp, same failure. Also happens with require => false added to the line, so it seems railties somehow injects itself into rake even if nothing is using it?! "Cool"! By the way, why are we still pinning minitest to < 5.11? Oh right, this was the original reason to look into that whole topic. And, uh, it's pointing at klass there already! 4 years ago! So lets remove that boundary and funny enough, now tests are passing again, even if we use klass! Minitest 5.11 changed how Minitest::Test is structured, and seems not to rely on klass at that point anymore. And I guess Rails also changed a bit since the original pin was put in place four years ago. I didn't want to go another rabbit hole, finding out what changed in Rails, but I did try with 5.0 (well, 5.0.7.2) to be precise, and the output with newer (>= 5.11) Minitest was interesting:

$ bundle exec rake test TESTOPTS=-v
 
Minitest::Result#test_lol = 0.00 s = .

It's leaking Minitest::Result as klass now, instead of Mocha::Mock. So probably something along these lines was broken 4 years ago and triggered this pin. What do we learn from that?

klass is cursed and shouldn't be used in places where inheritance and tooling might decide to use it for some reason
inheritance is cursed - why the heck are implementation details of Minitest leaking inside my tests?!
tooling is cursed - why is railties injecting stuff when I didn't ask it to?!
dependency pinning is cursed - at least if you pin to avoid an issue and then forget about said issue for four years
I like cursed things!

29 November 2021

Evgeni Golov: Getting access to somebody else's Ansible Galaxy namespace

TL;DR: adding features after the fact is hard, normalizing names is hard, it's patched, carry on. I promise, the longer version is more interesting and fun to read! Recently, I was poking around Ansible Galaxy and almost accidentally got access to someone else's namespace. I was actually looking for something completely different, but accidental finds are the best ones! If you're asking yourself: "what the heck is he talking about?!", let's slow down for a moment:

Ansible is a great automation engine built around the concept of modules that do things (mostly written in Python) and playbooks (mostly written in YAML) that tell which things to do
Ansible Galaxy is a place where people can share their playbooks and modules for others to reuse
Galaxy Namespaces are a way to allow users to distinguish who published what and reduce name clashes to a minimum

That means that if I ever want to share how to automate installing vim, I can publish evgeni.vim on Galaxy and other people can download that and use it. And if my evil twin wants their vim recipe published, it will end up being called evilme.vim. Thus while both recipes are called vim they can coexist, can be downloaded to the same machine, and used independently. How do you get a namespace? It's automatically created for you when you login for the first time. After that you can manage it, you can upload content, allow others to upload content and other things. You can also request additional namespaces, this is useful if you want one for an Organization or similar entities, which don't have a login for Galaxy. Apropos login, Galaxy uses GitHub for authentication, so you don't have to store yet another password, just smash that octocat! Did anyone actually click on those links above? If you did (you didn't, right?), you might have noticed another section in that document: Namespace Limitations. That says:

Namespace names in Galaxy are limited to lowercase word characters (i.e., a-z, 0-9) and _ , must have a minimum length of 2 characters, and cannot start with an _ . No other characters are allowed, including . , - , and space. The first time you log into Galaxy, the server will create a Namespace for you, if one does not already exist, by converting your username to lowercase, and replacing any - characters with _ .

For my login evgeni this is pretty boring, as the generated namespace is also evgeni. But for the GitHub user Evil-Pwnwil-666 it will become evil_pwnwil_666. This can be a bit confusing. Another confusing thing is that Galaxy supports two types of content: roles and collections, but namespaces are only for collections! So it is Evil-Pwnwil-666.vim if it's a role, but evil_pwnwil_666.vim if it's a collection. I think part of this split is because collections were added much later and have a much more well thought design of both the artifact itself and its delivery mechanisms. This is by the way very important for us! Due to the fact that collections (and namespaces!) were added later, there must be code that ensures that users who were created before also get a namespace. Galaxy does this (and I would have done it the same way) by hooking into the login process, and after the user is logged in it checks if a Namespace exists and if not it creates one and sets proper permissions. And this is also exactly where the issue was! The old code looked like this:

    # Create lowercase namespace if case insensitive search does not find match
    qs = models.Namespace.objects.filter(
        name__iexact=sanitized_username).order_by('name')
    if qs.exists():
        namespace = qs[0]
    else:
        namespace = models.Namespace.objects.create(**ns_defaults)
    namespace.owners.add(user)

See how namespace.owners.add is always called? Even if the namespace already existed? Yepp! But how can we exploit that? Any user either already has a namespace (and owns it) or doesn't have one that could be owned. And given users are tied to GitHub accounts, there is no way to confuse Galaxy here. Now, remember how I said one could request additional namespaces, for organizations and stuff? Those will have owners, but the namespace name might not correspond to an existing user! So all we need is to find an existing Galaxy namespace that is not a "default" namespace (aka a specially requested one) and get a GitHub account that (after the funny name conversion) matches the namespace name. Thankfully Galaxy has an API, so I could dump all existing namespaces and their owners. Next I filtered that list to have only namespaces where the owner list doesn't contain a username that would (after conversion) match the namespace name. I found a few. And for one of them (let's call it the_target), the corresponding GitHub username (the-target) was available! Jackpot! I've registered a new GitHub account with that name, logged in to Galaxy and had access to the previously found namespace. This felt like sufficient proof that my attack worked and I mailed my findings to the Ansible Security team. The issue was fixed in d4f84d3400f887a26a9032687a06dd263029bde3 by moving the namespace.owners.add call to the "new namespace" branch. And this concludes the story of how I accidentally got access to someone else's Galaxy namespace (which was revoked after the report, no worries).

20 October 2021

Ian Jackson: Going to work for the Tor Project

I have accepted a job with the Tor Project. I joined XenSource to work on Xen in late 2007, as XenSource was being acquired by Citrix. So I have been at Citrix for about 14 years. I have really enjoyed working on Xen. There have been a variety of great people. I'm very proud of some of the things we built and achieved. I'm particularly proud of being part of a community that has provided the space for some of my excellent colleagues to really grow. But the opportunity to go and work on a project I am so ideologically aligned with, and to work with Rust full-time, is too good to resist. That Tor is mostly written in C is quite terrifying, and I'm very happy that I'm going to help fix that!

comments

29 September 2021

Ian Jackson: Rust for the Polyglot Programmer

Rust is definitely in the news. I'm definitely on the bandwagon. (To me it feels like I've been wanting something like Rust for many years.) There're a huge number of intro tutorials, and of course there's the Rust Book. A friend observed to me, though, that while there's a lot of "write your first simple Rust program" there's a dearth of material aimed at the programmer who already knows a dozen diverse languages, and is familiar with computer architecture, basic type theory, and so on. Or indeed, for the impatient and confident reader more generally. I thought I would have a go. Rust for the Polyglot Programmer is the result. Compared to much other information about Rust, Rust for the Polyglot Programmer is:

Dense: I assume a lot of starting knowledge. Or to look at it another way: I expect my reader to be able to look up and digest non-Rust-specific words or concepts.
Broad: I cover not just the language and tools, but also the library ecosystem, development approach, community ideology, and so on.
Frank: much material about Rust has a tendency to gloss over or minimise the bad parts. I don't do that. That also frees me to talk about strategies for dealing with the bad parts.
Non-neutral: I'm not afraid to recommend particular libraries, for example. I'm not afraid to extol Rust's virtues in the areas where it does well.
Terse, and sometimes shallow: I often gloss over what I see as unimportant or fiddly details; instead I provide links to appropriate reference materials.

After reading Rust for the Polyglot Programmer, you won't know everything you need to know to use Rust for any project, but should know where to find it. Thanks are due to Simon Tatham, Mark Wooding, Daniel Silverstone, and others, for encouragement, and helpful reviews including important corrections. Particular thanks to Mark Wooding for wrestling pandoc and LaTeX into producing a pretty good-looking PDF. Remaining errors are, of course, mine. Comments are welcome of course, via the Dreamwidth comments or Salsa issue or MR. (If you're making a contribution, please indicate your agreement with the Developer Certificate of Origin.)

edited 2021-09-29 16:58 UTC to fix Salsa link targe, and 17:01 and 17:21 to for minor grammar fixes

comments

22 September 2021

Ian Jackson: Tricky compatibility issue - Rust's io::ErrorKind

This post is about some changes recently made to Rust's ErrorKind, which aims to categorise OS errors in a portable way. Audiences for this post

The educated general reader interested in a case study involving error handling, stability, API design, and/or Rust.
Rust users who have tripped over these changes. If this is you, you can cut to the chase and skip to How to fix.

Background and context Error handling principles Handling different errors differently is often important (although, sadly, often neglected). For example, if a program tries to read its default configuration file, and gets a "file not found" error, it can proceed with its default configuration, knowing that the user hasn't provided a specific config. If it gets some other error, it should probably complain and quit, printing the message from the error (and the filename). Otherwise, if the network fileserver is down (say), the program might erroneously run with the default configuration and do something entirely wrong. Rust's portability aims The Rust programming language tries to make it straightforward to write portable code. Portable error handling is always a bit tricky. One of Rust's facilities in this area is std::io::ErrorKind which is an enum which tries to categorise (and, sometimes, enumerate) OS errors. The idea is that a program can check the error kind, and handle the error accordingly. That these ErrorKinds are part of the Rust standard library means that to get this right, you don't need to delve down and get the actual underlying operating system error number, and write separate code for each platform you want to support. You can check whether the error is ErrorKind::NotFound (or whatever). Because ErrorKind is so important in many Rust APIs, some code which isn't really doing an OS call can still have to provide an ErrorKind. For this purpose, Rust provides a special category ErrorKind::Other, which doesn't correspond to any particular OS error. Rust's stability aims and approach Another thing Rust tries to do is keep existing code working. More specifically, Rust tries to:

Avoid making changes which would contradict the previously-published documentation of Rust's language and features.
Tell you if you accidentally rely on properties which are not part of the published documentation.

By and large, this has been very successful. It means that if you write code now, and it compiles and runs cleanly, it is quite likely that it will continue work properly in the future, even as the language and ecosystem evolves. This blog post is about a case where Rust failed to do (2), above, and, sadly, it turned out that several people had accidentally relied on something the Rust project definitely intended to change. Furthermore, it was something which needed to change. And the new (corrected) way of using the API is not so obvious. Rust enums, as relevant to io::ErrorKind (Very briefly:) When you have a value which is an io::ErrorKind, you can compare it with specific values:

    if error.kind() == ErrorKind::NotFound   ...

But in Rust it's more usual to write something like this (which you can read like a switch statement):

    match error.kind()  
      ErrorKind::NotFound => use_default_configuration(),
      _ => panic!("could not read config file  :  ", &file, &error),

Here _ means "anything else". Rust insists that match statements are exhaustive, meaning that each one covers all the possibilities. So if you left out the line with the _, it wouldn't compile. Rust enums can also be marked non_exhaustive, which is a declaration by the API designer that they plan to add more kinds. This has been done for ErrorKind, so the _ is mandatory, even if you write out all the possibilities that exist right now: this ensures that if new ErrorKinds appear, they won't stop your code compiling. Improving the error categorisation The set of error categories stabilised in Rust 1.0 was too small. It missed many important kinds of error. This makes writing error-handling code awkward. In any case, we expect to add new error categories occasionally. I set about trying to improve this by proposing new ErrorKinds. This obviously needed considerable community review, which is why it took about 9 months. The trouble with Other and tests Rust has to assign an ErrorKind to every OS error, even ones it doesn't really know about. Until recently, it mapped all errors it didn't understand to ErrorKind::Other - reusing the category for "not an OS error at all". Serious people who write serious code like to have serious tests. In particular, testing error conditions is really important. For example, you might want to test your program's handling of disk full, to make sure it didn't crash, or corrupt files. You would set up some contraption that would simulate a full disk. And then, in your tests, you might check that the error was correct. But until very recently (still now, in Stable Rust), there was no ErrorKind::StorageFull. You would get ErrorKind::Other. If you were diligent you would dig out the OS error code (and check for ENOSPC on Unix, corresponding Windows errors, etc.). But that's tiresome. The more obvious thing to do is to check that the kind is Other. Obvious but wrong. ErrorKind is non_exhaustive, implying that more error kinds will appears, and, naturally, these would more finely categorise previously-Other OS errors. Unfortunately, the documentation note

Errors that are Other now may move to a different or a new ErrorKind variant in the future.

was only added in May 2020. So the wrongness of the "obvious" approach was, itself, not very obvious. And even with that docs note, there was no compiler warning or anything. The unfortunate result is that there is a body of code out there in the world which might break any time an error that was previously Other becomes properly categorised. Furthermore, there was nothing stopping new people writing new obvious-but-wrong code. Chosen solution: Uncategorized The Rust developers wanted an engineered safeguard against the bug of assuming that a particular error shows up as Other. They chose the following solution: There is now a new ErrorKind::Uncategorized which is now used for all OS errors for which there isn't a more specific categorisation. The fallback translation of unknown errors was changed from Other to Uncategorised. This is de jure justified by the fact that this enum has always been marked non_exhaustive. But in practice because this bug wasn't previously detected, there is such code in the wild. That code now breaks (usually, in the form of failing test cases). Usually when Rust starts to detect a particular programming error, it is reported as a new warning, which doesn't break anything. But that's not possible here, because this is a behavioural change. The new ErrorKind::Uncategorized is marked unstable. This makes it impossible to write code on Stable Rust which insists that an error comes out as Uncategorized. So, one cannot now write code that will break when new ErrorKinds are added. That's the intended effect. The downside is that this does break old code, and, worse, it is not as clear as it should be what the fixed code looks like. Alternatives considered and rejected by the Rust developers Not adding more ErrorKinds This was not tenable. The existing set is already too small, and error categorisation is in any case expected to improve over time. Just adding ErrorKinds as had been done before This would mean occasionally breaking test cases (or, possibly, production code) when an error that was previously Other becomes categorised. The broken code would have been "obvious", but de jure wrong, just as it is now, So this option amounts to expecting this broken code to continue to be written and continuing to break it occasionally. Somehow using Rust's Edition system The Rust language has a system to allow language evolution, where code declares its Edition (2015, 2018, 2021). Code from multiple editions can be combined, so that the ecosystem can upgrade gradually. It's not clear how this could be used for ErrorKind, though. Errors have to be passed between code with different editions. If those different editions had different categorisations, the resulting programs would have incoherent and broken error handling. Also some of the schemes for making this change would mean that new ErrorKinds could only be stabilised about once every 3 years, which is far too slow. How to fix code broken by this change Most main-line error handling code already has a fallback case for unknown errors. Simply replacing any occurrence of Other with _ is right. How to fix thorough tests The tricky problem is tests. Typically, a thorough test case wants to check that the error is "precisely as expected" (as far as the test can tell). Now that unknown errors come out as an unstable Uncategorized variant that's not so easy. If the test is expecting an error that is currently not categorised, you want to write code that says "if the error is any of the recognised kinds, call it a test failure". What does "any of the recognised kinds" mean here ? It doesn't meany any of the kinds recognised by the version of the Rust stdlib that is actually in use. That set might get bigger. When the test is compiled and run later, perhaps years later, the error in this test case might indeed be categorised. What you actually mean is "the error must not be any of the kinds which existed when the test was written". IMO therefore the right solution for such a test case is to cut and paste the current list of stable ErrorKinds into your code. This will seem wrong at first glance, because the list in your code and in Rust can get out of step. But when they do get out of step you want your version, not the stdlib's. So freezing the list at a point in time is precisely right. You probably only want to maintain one copy of this list, so put it somewhere central in your codebase's test support machinery. Periodically, you can update the list deliberately - and fix any resulting test failures. Unfortunately this approach is not suggested by the documentation. In theory you could work all this out yourself from first principles, given even the situation prior to May 2020, but it seems unlikely that many people have done so. In particular, cutting and pasting the list of recognised errors would seem very unnatural. Conclusions This was not an easy problem to solve well. I think Rust has done a plausible job given the various constraints, and the result is technically good. It is a shame that this change to make the error handling stability more correct caused the most trouble for the most careful people who write the most thorough tests. I also think the docs could be improved.

edited shortly after posting, and again 2021-09-22 16:11 UTC, to fix HTML slips

comments

15 September 2021

Ian Jackson: Get source to Debian packages only via dgit; "official" git links are beartraps

tl;dr dgit clone sourcepackage gets you the source code, as a git tree, in ./sourcepackage. cd into it and dpkg-buildpackage -uc -b. Do not use: "VCS" links on official Debian web pages like tracker.debian.org; "debcheckout"; searching Debian's gitlab (salsa.debian.org). These are good for Debian experts only. If you use Debian's "official" source git repo links you can easily build a package without Debian's patches applied.[1] This can even mean missing security patches. Or maybe it can't even be built in a normal way (or at all). OMG WTF BBQ, why? It's complicated. There is History. Debian's "most-official" centralised source repository is still the Debian Archive, which is a system based on tarballs and patches. I invented the Debian source package format in 1992/3 and it has been souped up since, but it's still tarballs and patches. This system is, of course, obsolete, now that we have modern version control systems, especially git. Maintainers of Debian packages have invented ways of using git anyway, of course. But this is not standardised. There is a bewildering array of approaches. The most common approach is to maintain git tree containing a pile of *.patch files, which are then often maintained using quilt. Yes, really, many Debian people are still using quilt, despite having git! There is machinery for converting this git tree containing a series of patches, to an "official" source package. If you don't use that machinery, and just build from git, nothing applies the patches. [1] This post was prompted by a conversation with a friend who had wanted to build a Debian package, and didn't know to use dgit. They had got the source from salsa via a link on tracker.d.o, and built .debs without Debian's patches. This not a theoretical unsoundness, but a very real practical risk. Future is not very bright In 2013 at the Debconf in Vaumarcus, Joey Hess, myself, and others, came up with a plan to try to improve this which we thought would be deployable. (Previous attempts had failed.) Crucially, this transition plan does not force change onto any of Debian's many packaging teams, nor onto people doing cross-package maintenance work. I worked on this for quite a while, and at a technical level it is a resounding success. Unfortunately there is a big limitation. At the current stage of the transition, to work at its best, this replacement scheme hopes that maintainers who update a package will use a new upload tool. The new tool fits into their existing Debian git packaging workflow and has some benefits, but it does make things more complicated rather than less (like any transition plan must, during the transitional phase). When maintainers don't use this new tool, the standardised git branch seen by users is a compatibility stub generated from the tarballs-and-patches. So it has the right contents, but useless history. The next step is to allow a maintainer to update a package without dealing with tarballs-and-patches at all. This would be massively more convenient for the maintainer, so an easy sell. And of course it links the tarballs-and-patches to the git history in a proper machine-readable way. We held a "git packaging requirements-gathering session" at the Curitiba Debconf in 2019. I think the DPL's intent was to try to get input into the git workflow design problem. The session was a great success: my existing design was able to meet nearly everyone's needs and wants. The room was obviously keen to see progress. The next stage was to deploy tag2upload. I spoke to various key people at the Debconf and afterwards in 2019 and the code has been basically ready since then. Unfortunately, deployment of tag2upload is mired in politics. It was blocked by a key team because of unfounded security concerns; positive opinions from independent security experts within Debian were disregarded. Of course it is always hard to get a team to agree to something when it's part of a transition plan which treats their systems as an obsolete setup retained for compatibility. Current status If you don't know about Debian's git packaging practices (eg, you have no idea what "patches-unapplied packaging branch without .pc directory" means), and don't want want to learn about them, you must use dgit to obtain the source of Debian packages. There is a lot more information and detailed instructions in dgit-user(7). Hopefully either the maintainer did the best thing, or, if they didn't, you won't need to inspect the history. If you are a Debian maintainer, you should use dgit push-source to do your uploads. This will make sure that users of dgit will see a reasonable git history.

edited 2021-09-15 14:48 Z to fix a typo

comments

8 September 2021

Ian Jackson: Wanted: Rust sync web framework

tl;dr: Please recommend me a high-level Rust server-side web framework which is sync and does not plan to move to an async api. Why Async Rust gives somewhat higher performance. But it is considerably less convenient and ergonomic than using threads for concurrency. Much work is ongoing to improve matters, but I doubt async Rust will ever be as easy to write as sync Rust. "Even" sync multithreaded Rust is very fast (and light on memory use) compared to many things people write web apps in. The vast majority of web applications do not need the additional performance (which is typically a percentage, not a factor). So it is rather disappointing to find that all the review articles I read, and all the web framework authors, seem to have assumed that async is the inevitable future. There should be room for both sync and async. Please let universal use of async not be the inevitable future! What I would like a web framework that provides a sync API (something like Rocket 0.4's API would be ideal) and will remain sync. It should probably use (async) hyper underneath. So far I have not found one single web framework on crates.io that neither is already async nor suggests that its authors intend to move to an async API. Some review articles I found even excluded sync frameworks entirely! Answers in the comments please :-).

comments

2 September 2021

Ian Jackson: partial-borrow: references to restricted views of a Rust struct

tl;dr:
With these two crazy proc-macros you can hand out multipe (perhaps mutable) references to suitable subsets/views of the same struct. Why In Otter I have adopted a style where I try to avoid giving code mutable access that doesn't need it, and try to make mutable access come with some code structures to prevent "oh I forgot a thing" type mistakes. For example, mutable access to a game state is only available in contexts that have to return a value for the updates to send to the players. This makes it harder to forget to send the update. But there is a downside. The game state is inside another struct, an Instance, and much code needs (immutable) access to it. I can't pass both &Instance and &mut GameState because one is inside the other. My workaround involves passing separate references to the other fields of Instance, leading to some functions taking far too many arguments. 14 in one case. (They're all different types so argument ordering mistakes just result in compiler errors talking about arguments 9 and 11 having wrong types, rather than actual bugs.) I felt this problem was purely a restriction arising from limitations of the borrow checker. I thought it might be possible to improve on it. Weeks passed and the question gradually wormed its way into my consciousness. Eventually, I tried some experiments. Encouraged, I persisted. What and how partial-borrow is a Rust library which solves this problem. You sprinkle #[Derive(PartialBorrow)] and partial!(...) and then you can pass a reference which grants mutable access to only some of the fields. You can also pass a reference through which some fields are inaccessible. You can even split a single mut reference into multiple compatible references, for example granting mut access to mutually-nonverlapping subsets. The core type is Struct__Partial (for some Struct). It is a zero-sized type, but we prevent anyone from constructing one. Instead we magic up references to it, always ensuring that they have the same address as some Struct. The fields of Struct__Partial are also ZSTs that exist ony as references, and they Deref to the actual field (subject to compile-type borrow compatibility checking). Soundness and testing partial-borrow is primarily a nontrivial procedural macro which autogenerates reams of unsafe. Of course I think it's sound, but I thought that the last two times before I added a test which demonstrated otherwise. So it might be fairer to say that I have tried to make it sound and that I don't know of any problems... Reasoning about the correctness of macro-generated code is not so easy. One problem is that there is nowhere good to put the kind of soundness arguments you would normally add near uses of unsafe. I decided to solve this by annotating an instance of the macro output. There's a not very complicated script using diff3 to help fold in changes if the macro output changes - merge conflicts there mean a possible re-review of the argument text. Of course I also have test cases that run with miri, and test cases for expected compiler errors for uses that need to be forbidden for soundness. But this is quite hairy and I'm worried that it might be rather "my first insane unsafe contraption". Also the pointer/reference trickery is definitely subtle, and depends heavily on knowing what Rust's aliasing and pointer provenance rules really are. Stacked Borrows is not entirely trivial to reason about in fiddly corner cases. So for now I have only called it 0.1.0 and left a note in the docs. I haven't actually made Otter use it yet but that's the rather more boring software integration part, not the fun "can I do this mad thing" part so I will probably leave that for a rainy day. Possibly a rainy day after someone other than me has looked at partial-borrow (preferably someone who understands Stacked Borrows...). Fun! This was great fun. I even enjoyed writing the docs. The proc-macro programming environment is not entirely straightforward and there are a number of things to watch out for. For my first non-adhoc proc-macro this was, perhaps, ambitious. But you don't learn anything without trying...

edited 2021-09-02 16:28 UTC to fix a typo

comments

17 August 2021

Ian Jackson: Releasing nailing-cargo 1.0.0

Summary I have just tagged nailing-cargo/1.0.0. nailing-cargo is a wrapper around the Rust build tool cargo. nailing-cargo can:

Work around cargo's problem with totally unpublished local crates (by temporally massaging your Cargo.toml)
Make it easier to work in Rust without giving the Rust environment (and every author of every one of your dependencies!) control of your main account. (Privsep, with linkfarming where needed.)
Tweak a few defaults.

Background and history It's not really possible to make a nontrivial Rust project without using cargo. But the build process automatically downloads and executes code from crates.io, which is a minimally-curated repository. I didn't want to expose my main account to that. And, at the time, I was working on a project which for which I was also writing a library as a dependency, and I found that cargo couldn't cope with this unless I were to commit (to my git repository) the path (on my local laptop) of my dependency. I filed some bugs, including about the unpublished crate problem. But also, I was stubborn enough to try to find a workaround that didn't involve committing junk to my git history. The result was a short but horrific shell script. I wrote about this at the time (March 2019). Over the last few years the difficulties I have with cargo have remained un-resolved. I found my interactions with upstream rather discouraging. It didn't seem like I would get anywhere by trying to help improve cargo to better support my needs. So instead I have gradually improved nailing-cargo. It is now a Perl script. It is rather less horrific, and has proper documentation (sorry, JS needed because GitLab). Why Perl ? Rust would have been my language of choice. But I wanted to avoid a chicken-and-egg situation. When you're doing privsep, nailing-cargo has to run in your more privileged environment. I wanted something easy to get going with. nailing-cargo has to contain a TOML parser; and I found a small one, TOML-Tiny, which was good enough as a starting point, and small enough I could bundle it as a git subtree. Perl is nicely fast to start up (nailing-cargo --- true runs in about 170ms on my laptop), and it is easy to write a Perl script that will work on pretty much any Perl installation. Still unsolved: embedding cargo in another build system A number of my projects contain a mixture of Rust code with other languages. Unfortunately, nailing-cargo doesn't help with the problems which arise trying to integrate cargo into another build system. I generally resort to find runes for finding Rust source files that might influence cargo, and stamp files for seeing if I have run it recently enough; and I simply live with the fact that cargo sometimes builds more stuff than I needed it to. Future There are a number of ways nailing-cargo could be improved. Notably, the need to overwrite your actual Cargo.toml is very annoying, even if nailing-cargo puts it back afterwards. A big problem with this is that it means that nailing-cargo has to take a lock, while your cargo rune runs. This effectively prevents using nailing-cargo with long-running processes. Notably, editor integrations like rls and racer. I could perhaps solve this with more linkfarm-juggling, but that wouldn't help in-tree builds and it's hard to keep things up to date. I am considering using LD_PRELOAD trickery or maybe bwrap(1) to "implement" the alternative Cargo.toml feature which was rejected by cargo upstream in 2019 (and again in April when someone else asked). Currently there is no support for using sudo for out-of-tree privsep. This should be easy to add but it needs someone who uses sudo to want it (and to test it!) The documentation has some other dicusssion of limitations, some of which aren't too hard to improve. Patches welcome!

comments

16 July 2021

Jamie McClelland: From Ikiwiki to Hugo

Back in the days of Etch, I converted this blog from Drupal to ikiwiki. I remember being very excited about this brand new concept of static web sites derived from content stored in a version control system. And now over a decade later I ve moved to hugo. I feel some loyalty to ikiwiki and Joey Hess for opening my eyes to the static web site concept. But ultimately I grew tired of splitting my time and energy between learning ikiwiki and hugo, which has been my tool of choice for new projects. When I started getting strange emails that I suspect had something to do with spammers filling out ikiwiki s commenting registration system, I choose to invest my time in switching to hugo over debugging and really understanding how ikiwiki handles user registration. I carefully reviewed anarcat s blog on converting from ikiwiki to hugo and learned about a lot of ikiwiki features I am not using. Wow, it s times like these that I m glad I keep it really simple. Based on the various ikiwiki2hugo python scripts I studied, I eventually wrote a far simpler one tailored to my needs. Also, in what could only be called a desperate act of procrastination combined with a touch of self-hatred (it s been a rough week) I rejected all the commenting options available to me and choose to implement my own in PHP. What?!?! Why would anyone do such a thing? I refer you to my previous sentence about desperate procrastination. And also I know it s fashionable to hate PHP, but honestly as the first programming language I learned, there is something comforting and familiar about it. And, on a more objective level, I can deploy it easily to just about any hosting provider in the world. I don t have to maintain a unicorn service or a nodejs service and make special configuration entries in my web configuration. All I have to do is upload the php files and I m done. Well, I m sure I ll regret this decision. Special thanks to Alexander Bilz for the anatole hugo theme. I choose it via a nearly random click to avoid the rabbit hole of choosing a theme. And, by luck, it has turned out quite well. I only had to override the commento partial theme page to hijack it for my own commenting system s use.

1 June 2021

Russ Allbery: Review: The Horse and His Boy

Review: The Horse and His Boy, by C.S. Lewis

Illustrator:	Pauline Baynes
Series:	Chronicles of Narnia #5
Publisher:	Collier Books
Copyright:	1954
Printing:	1978
ISBN:	0-02-044200-9
Format:	Mass market
Pages:	217

The Horse and His Boy was the fifth published book in the Chronicles of Narnia, but it takes place during the last chapter of The Lion, the Witch and the Wardrobe, in the midst of the golden age of Narnia. It's the only true side story of the series and it doesn't matter much where in sequence you read it, as long as it's after The Lion, the Witch and the Wardrobe and before The Last Battle (which would spoil its ending somewhat). MAJOR SPOILERS BELOW. The Horse and His Boy is also the only book of the series that is not a portal fantasy. The Pevensie kids make an appearance, but as the ruling kings and queens of Narnia, and only as side characters. The protagonists are a boy named Shasta, a girl named Aravis, and horses named Bree and Hwin. Aravis is a Calormene, a native of the desert (and extremely Orientalist, but more on that later) kingdom to the south of Narnia and Archenland. Shasta starts the book as the theoretically adopted son but mostly slave of a Calormene fisherman. The Horse and His Boy is the story of their journey from Calormen north to Archenland and Narnia, just in time to defend Narnia and Archenland from an invasion. This story starts with a great hook. Shasta's owner is hosting a passing Tarkaan, a Calormene lord, and overhears a negotiation to sell Shasta to the Tarkaan as his slave (and, in the process, revealing that he rescued Shasta as an infant from a rowboat next to a dead man). Shasta starts talking to the Tarkaan's horse and is caught by surprise when the horse talks back. He is a Talking Horse from Narnia, kidnapped as a colt, and eager to return to Narnia and the North. He convinces Shasta to attempt to escape with him. This has so much promise. For once, we're offered a story where one of the talking animals of Narnia is at least a co-protagonist and has some agency in the story. Bree takes charge of Shasta, teaches him to ride (or, mostly, how to fall off a horse), and makes most of the early plans. Finally, a story that recognizes that Narnia stories don't have to revolve around the humans! Unfortunately, Bree is an obnoxious, arrogant character. I wanted to like him, but he makes it very hard. This gets even worse when Shasta is thrown together with Aravis, a noble Calormene girl who is escaping an arranged marriage on her own talking mare, Hwin. Bree is a warhorse, Hwin is a lady's riding mare, and Lewis apparently knows absolutely nothing about horses, because every part of Bree's sexist posturing and Hwin's passive meekness is awful and cringe-worthy. I am not a horse person, so will link to Judith Tarr's much more knowledgeable critique at Tor.com, but suffice it to say that mares are not meekly deferential or awed by stallions. If Bree had behaved that way with a real mare, he would have gotten the crap beaten out of him (which might have improved his attitude considerably). As is, we have to put up with rather a lot of Bree's posturing and Hwin (who I liked much better) barely gets a line and acts disturbingly like she was horribly abused. This makes me sad, because I like Bree's character arc. He's spent his whole life being special and different from those around him, and while he wants to escape this country and return home, he's also gotten used to being special. In Narnia, he will just be a normal talking horse. To get everything else he wants, he also has to let go of the idea that he's someone special. If Lewis had done more with this and made Bree a more sympathetic character, this could have been very effective. As written, it only gets a few passing mentions (mostly via Bree being weirdly obsessed with whether talking horses roll) and is therefore overshadowed by Shasta's chosen one story and Bree's own arrogant behavior. The horses aside, this is a passable adventure story with some well-done moments. The two kids and their horses end up in Tashbaan, the huge Calormene capital, where they stumble across the Narnians and Shasta is mistaken for one of their party. Radagast, the prince of Calormen, is proposing marriage to Susan, and the Narnians are in the process of realizing he doesn't plan to take no for an answer. Aravis, meanwhile, has to sneak out of the city via the Tisroc's gardens, which results in her hiding behind a couch as she hears Radagast's plans to invade Archenland and Narnia to take Susan as his bride by force. Once reunited, Shasta, Aravis, and the horses flee across the desert to bring warning to Archenland and then Narnia. Of all the Narnia books, The Horse and His Boy leans the hardest into the personal savior angle of Christianity. Parts of it, such as Shasta's ride over the pass into Narnia, have a strong "Footprints" feel to them. Most of the events of the book are arranged by Aslan, starting with Shasta's early life. Readers of the series will know this when a lion shows up early to herd the horses where they need to go, or when a cat keeps Shasta company in the desert and frightens away jackals. Shasta only understands near the end. I remember this being compelling stuff as a young Christian reader. This personal attention and life shaping from God is pure Christian wish fulfillment of the "God has a plan for your life" variety, even more so than Shasta turning out to be a lost prince. As an adult re-reader, I can see that Lewis is palming the theodicy card rather egregiously. It's great that Aslan was making everything turn out well in the end, but why did he have to scare the kids and horses half to death in the process? They were already eager to do what he wanted, but it's somehow inconceivable that Aslan would simply tell them what to do rather than manipulate them. There's no obvious in-story justification why he couldn't have made the experience much less terrifying. Or, for that matter, prevented Shasta from being kidnapped as an infant in the first place and solved the problem of Radagast in a more direct way. This sort of theology takes as an unexamined assumption that a deity must refuse to use his words and instead do everything in weirdly roundabout and mysterious ways, which makes even less sense in Narnia than in our world given how directly and straightforwardly Aslan has acted in previous books. It was also obvious to me on re-read how unfair Lewis's strict gender roles are to Aravis. She's an excellent rider from the start of the book and has practiced many of the things Shasta struggles to do, but Shasta is the boy and Aravis is the girl, so Aravis has to have girl adventures involving tittering princesses, luxurious baths, and eavesdropping behind couches, whereas Shasta has boy adventures like riding to warn the king or bringing word to Narnia. There's nothing very objectionable about Shasta as a character (unlike Bree), but he has such a generic character arc. The Horse and Her Girl with Aravis and Hwin as protagonists would have been a more interesting story, and would have helpfully complicated the whole Narnia and the North story motive. As for that storyline, wow the racism is strong in this one, starting with the degree that The Horse and His Boy is deeply concerned with people's skin color. Shasta is white, you see, clearly marking him as from the North because all the Calormenes are dark-skinned. (This makes even less sense in this fantasy world than in our world because it's strongly implied in The Magician's Nephew that all the humans in Calormen came from Narnia originally.) The Calormenes all talk like characters from bad translations of the Arabian Nights and are shown as cruel, corrupt slavers with a culture that's a Orientalist mishmash of Arab, Persian, and Chinese stereotypes. Everyone is required to say "may he live forever" after referencing the Tisroc, which is an obvious and crude parody of Islam. This stereotype fest culminates in the incredibly bizarre scene that Aravis overhears, in which the grand vizier literally grovels on the floor while Radagast kicks him and the Tisroc, Radagast's father, talks about how Narnia's freedom offends him and the barbarian kingdom would be more profitable and orderly when conquered. The one point to Lewis's credit is that Aravis is also Calormene, tells stories in the same style, and is still a protagonist and just as acceptable to Aslan as Shasta is. It's not enough to overcome the numerous problems with Lewis's lazy world-building, but it makes me wish even more that Aravis had gotten her own book and more meaningful scenes with Aslan. I had forgotten that Susan appears in this book, although that appearance doesn't add much to the general problem of Susan in Narnia except perhaps to hint at Lewis's later awful choices. She is shown considering marriage to the clearly villainous Radagast, and then only mentioned later with a weird note that she doesn't ride to war despite being the best archer of the four. I will say again that it's truly weird to see the Pevensie kids as (young) adults discussing marriage proposals, international politics, and border wars while remembering they all get dumped back into their previous lives as British schoolkids. This had to have had dramatic effects on their lives that Lewis never showed. (I know, the real answer is that Lewis is writing these books according to childhood imaginary adventure logic, where adventures don't have long-term consequences of that type.) I will also grumble once more at how weirdly ineffectual Narnians are until some human comes to tell them what to do. Calormen is obviously a threat; Susan just escaped from an attempted forced marriage. Archenland is both their southern line of defense and is an ally separated by a mountain pass in a country full of talking eagles, among other obvious messengers. And yet, it falls to Shasta to ride to give warning because he's the human protagonist of the story. Everyone else seems to be too busy with quirky domesticity or endless faux-medieval chivalric parties. The Horse and His Boy was one of my favorites when I was a kid, but reading as an adult I found it much harder to tolerate Bree or read past the blatant racial and cultural stereotyping. The bits with Aslan also felt less magical to me than they did as a kid because I was asking more questions about why Aslan had to do everything in such an opaque and perilous way. It's still not a bad adventure; Aravis is a great character, the bits in Tashbaan are at least memorable, and I still love the Hermit of the Southern March and want to know more about him. But I would rank it below the top tier of Narnia books, alongside Prince Caspian as a book with some great moments and some serious flaws. Followed in original publication order by The Magician's Nephew. Rating: 7 out of 10

Next.

Previous.